Abstract

Logistic regression was used to model the observation-to-indicator ratio needed for
the standard error scree procedure (SEscree) to correctly identify the number of factors present in
generated sample correlation matrices. The generated correlation matrices were manipulated along
the number of factors (4, 6), sample size (250, 500), magnitude of factor loadings (.5, .8), and
degree of interfactor correlation (0, .4). Consequently, the observation-to-indicator (n/v) and the
indicator-to-factor (v/f) ratios also varied. The results indicated that the optimal n/v ratio for
determining the number of factors by the standard error scree procedure depends on the
characteristics of the data. A smaller n/v ratio (7:1) was needed when factor loadings were high,
and a larger ratio (14:1 to 22:1) was needed with low loadings, particularly when factors were correlated.
In all conditions, the n/v ratio needed for the SEscree procedure to correctly identify the true number of
factors with high probability exceeded the minimum of 5:1 stated in some of the related
literature. Furthermore, the use of logistic regression provided a model for analyzing data from
complex simulation studies that makes otherwise complicated relationships easy to
communicate.

Key Words: Logistic Regression, Standard Error Scree, Factor Analysis, Number of Factors,
Hit Rate, Observation-to-Indicator Ratio
The number of observations needed to arrive at dependable conclusions has always been
an issue in scientific research. The performance of most statistical methods is directly or
indirectly affected by sample size because the precision of estimates depends largely on sample size
(Pedhazur & Schmelkin, 1991). Approaches to the determination of sample size include general
formulas and formulas specific to given sampling designs or research (e.g., Cochran, 1977; Hays, 1988;
Jaeger, 1984; Kirk, 1982; Kish, 1965; Winer, 1971), tables (e.g., Rotton & Schonemann, 1978;
Tiku, 1967), and power function charts (e.g., Pearson & Hartley, 1951). A more extensive
treatment is found in Cohen (1988) and in Kraemer and Thiemann (1987). Cohen (1988)
provided a detailed presentation of the elements of power analysis and illustrative applications in
diverse research contexts such as the t test, correlation, analysis of variance, and multiple regression.
In the aforementioned treatments of sample size, the number of indicators/predictors was not
rigorously dealt with.

The determination of sample size in relation to the number of indicators has also been
treated by several researchers, who have offered rules of thumb by which to determine the
sample size. Ad hoc rules of thumb for statistical models such as multiple regression suggest the
number of observations to the number of indicators ratio should be 10:1 to deal with problems of 
sampling variability and to ensure reasonable power (Tanaka, 1987). Huberty (1994), in the 
context of discriminant analysis, suggested that in order to estimate hit rates validly, the 
minimum number of observations in the smallest group should be at least three to five times the 
number of predictors, conditional on the type of discriminant analysis employed. 

In the context of factor analysis, despite the general agreement that large samples are
imperative for stability of factor analytic results, there is no agreement as to what constitutes
large. For example, Cattell (1978) referred to samples below 200 as “smallish” (p. 492). Comrey
(1978) recommended a sample of at least 200 observations, but he added that 2000 observations
were needed to stabilize the factor structure. Several rules of thumb have also been suggested.
Among these is Cattell’s (1952) 4:1 observation-to-indicator (n/v) rule. Nunnally (1978)
suggested that “a good rule is to have at least 10 times as many subjects as variables” (p. 421).
Gorsuch (1983) suggested five to ten observations per indicator, or several hundred. Cliff
(1987) offered rather looser guidelines. He stated that “with 40 or so variables, a group of 150
persons is about minimum, although 500 is preferable” (p. 339).
Wolins (1982) rightly described such rules of thumb as “incorrect” (p. 64) solutions to the
sample size problem, because the required sample size depends on the specific objectives of the
analysis and the characteristics of the data. According to Wolins (1982), the required sample size
varies, depending, among other things, on the number of factors expected; whether or not it is
“necessary to obtain good estimates of individual factor” (p. 64); whether or not the indicators are
“well behaved” (p. 64); and the magnitudes of the correlations among the indicators.

There is no doubt that the situation is complex and cannot be resolved by simple answers,
and the frequently cited suggestion to have as large a sample as feasible is not
always useful. Thus, how large should the sample size be, and what is the trade-off among sample
size, number of indicators, degree of loading, and other characteristics of the data? This study
was performed to address this question.

Purpose of the study 

The purpose of the study was to determine the optimal (n/v) ratio needed to correctly 
identify the number of common factors by the standard error scree procedure. The standard error 
scree procedure was the optimal method for determining the number of factors among the 
regression-based variations of the visual scree examined in previous work (Nasser, 1997). 

In addition, the study aimed to provide an example of using a modeling approach such as logistic
regression to facilitate the interpretation of extensive and complex research results that are not
easy to explicate otherwise.

Method 

Design and data generation 

For each condition, one hundred sample correlation matrices were generated from population correlation
matrices using Kaiser and Dickman’s (1962) method. The standard error scree procedure (Zoski
& Jurs, 1996) was used to determine the number of factors underlying the generated correlation
matrices. The generated correlation matrices were manipulated along the number of factors (4, 6),
sample size (250, 500), magnitude of factor loadings (.5, .8), and degree of interfactor
correlation (0, .4). The design is not completely crossed (see Appendix A). Although the n/v ratio
was not an explicit variable in the design, it was changed by manipulating sample size and
number of indicators. The levels of the manipulated variables were chosen to be sufficiently
different that their effect on the performance of the standard error scree procedure
would be clear, and to keep the design to a manageable size. The correlation matrices were
created based on the common factor model as described by Gorsuch (1983). The data generation
was performed via the IML procedure of SAS for PC, Release 6.08.
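The generation step can be illustrated with a minimal Python sketch. It is not the SAS/IML program used in the study. It assumes simple structure with equal loadings and an equal number of indicators per factor, and it uses a Cholesky factor of the population matrix where Kaiser and Dickman's method uses a principal-component factor (any matrix F with FF' = R yields data with population correlation R). All function names and the random seed are illustrative.

```python
import math
import random


def population_correlation(n_factors, per_factor, loading, factor_corr):
    """Implied correlation matrix under assumed simple structure and equal loadings."""
    v = n_factors * per_factor
    R = [[0.0] * v for _ in range(v)]
    for i in range(v):
        for j in range(v):
            if i == j:
                R[i][j] = 1.0
            elif i // per_factor == j // per_factor:
                R[i][j] = loading * loading                # indicators of the same factor
            else:
                R[i][j] = loading * factor_corr * loading  # indicators of different factors
    return R


def cholesky(R):
    """Lower-triangular L with L L' = R (R must be positive definite)."""
    v = len(R)
    L = [[0.0] * v for _ in range(v)]
    for i in range(v):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L


def sample_correlation(R, n, rng):
    """Draw n observations with population correlation R; return their correlation matrix."""
    v = len(R)
    L = cholesky(R)
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(v)]  # independent standard normal deviates
        data.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(v)])
    means = [sum(row[i] for row in data) / n for i in range(v)]
    centered = [[row[i] - means[i] for i in range(v)] for row in data]
    ss = [sum(row[i] ** 2 for row in centered) for i in range(v)]
    return [[sum(row[i] * row[j] for row in centered) / math.sqrt(ss[i] * ss[j])
             for j in range(v)] for i in range(v)]


# One condition from the design: f = 4, v = 16, l = .8, r = .4, n = 250.
rng = random.Random(1962)  # arbitrary seed, for reproducibility only
R_pop = population_correlation(n_factors=4, per_factor=4, loading=0.8, factor_corr=0.4)
R_sample = sample_correlation(R_pop, n=250, rng=rng)
```

Repeating the last line 100 times per condition would give the kind of replicated sample matrices described above.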

Data analysis 

The numbers of factors determined by the standard error scree procedure for each of the
100 samples under each condition were aggregated in one data file. The percentage of times the
SEscree procedure indicated the true number of factors (hit rate) was computed. Then logistic
regression was performed to obtain the predicted probabilities of determining the true number of
factors. Predictive discriminant analysis is a useful alternative method to address the same
question. However, some of the literature in which the two methods were contrasted indicates
that although the two methods yield similar results, logistic regression is based on fewer
assumptions, is more robust with respect to violations of assumptions, is easier to interpret, and
is more parsimonious where relevant (e.g., Aldrich & Nelson, 1984; Cleary & Angel, 1984;
Dattalo, 1994; Shott, 1991).
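The hit rate described above amounts to a simple proportion. The short Python sketch below shows the computation; the function name and the ten factor counts are hypothetical and are not results from the study.

```python
def hit_rate(estimated_counts, true_count):
    """Return the 0/1 outcome for each sample and the hit rate in percent."""
    outcomes = [1 if k == true_count else 0 for k in estimated_counts]
    return outcomes, 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical factor counts returned for ten samples from a four-factor condition.
estimates = [4, 4, 3, 4, 5, 4, 4, 4, 2, 4]
outcomes, rate = hit_rate(estimates, true_count=4)
print(outcomes, rate)
```

The 0/1 outcomes are the kind of dichotomous values that serve as the dependent variable in the logistic regression.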

The dependent variable (Y) in the logistic regression was a dichotomous variable with a
value of one when the standard error scree indicated the true number of factors and zero when the
procedure indicated an incorrect number of factors. Four logistic equations were obtained, one
under each of the four combinations of the degree of factor loadings and interfactor correlation.
In each of the four regression equations, the n/v ratio was the only predictor variable.
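As an illustration of one such equation, the sketch below fits a one-predictor logistic model to the hit counts reported in Table 1 for loadings of .8 and uncorrelated factors. Several choices here are assumptions rather than details given in the text: the four- and six-factor cells are pooled, n/v is computed as n divided by v instead of being taken from the rounded table column, and the fit uses a hand-written Newton-Raphson routine with step halving rather than the software used in the study. Under these assumptions the estimates should land close to the corresponding row of Table 2.

```python
import math

# Hits out of 100 samples per cell for l = .8, r = 0, as (n, v, hits) from Table 1.
CELLS = [
    (250, 16, 100), (250, 36, 99), (250, 48, 7),      # four factors, n = 250
    (500, 16, 100), (500, 36, 100), (500, 48, 100),   # four factors, n = 500
    (250, 36, 98), (250, 48, 9),                      # six factors, n = 250
    (500, 36, 100), (500, 48, 100),                   # six factors, n = 500
]
SAMPLES_PER_CELL = 100


def log_likelihood(a, b, data):
    """Binomial log-likelihood of logit(p) = a + b * x for (x, hits, trials) rows."""
    total = 0.0
    for x, hits, trials in data:
        eta = a + b * x
        # log(1 + exp(eta)), written so that large |eta| cannot overflow
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += hits * eta - trials * softplus
    return total


def fit_logistic(data, max_iter=200, tol=1e-10):
    """Newton-Raphson with step halving for a one-predictor logistic model."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, hits, trials in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = hits - trials * p
            w = trials * p * (1.0 - p)
            g0 += resid          # gradient with respect to the intercept
            g1 += resid * x      # gradient with respect to the slope
            h00 += w             # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        current = log_likelihood(a, b, data)
        scale = 1.0
        # Halve the step until the log-likelihood no longer decreases.
        while scale > 1e-8 and log_likelihood(
                a + scale * step_a, b + scale * step_b, data) < current:
            scale /= 2.0
        a += scale * step_a
        b += scale * step_b
        if abs(scale * step_a) + abs(scale * step_b) < tol:
            break
    return a, b


rows = [(n / v, hits, SAMPLES_PER_CELL) for n, v, hits in CELLS]
intercept, slope = fit_logistic(rows)
print(round(intercept, 2), round(slope, 2))
```

Fitting grouped counts per cell is equivalent to fitting the individual 0/1 outcomes, because every sample in a cell shares the same n/v value.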

A set of predicted probabilities was obtained from the regression equation under each of
the four conditions. Each set of predicted probabilities was plotted against the n/v values
included in the study, and the optimal n/v ratio for determining the true number of factors under
each condition was calculated. The actual probabilities of obtaining the correct number of
factors by the standard error scree were plotted on the same axes to demonstrate the degree
of match between the probability predicted by the model and the actual probability obtained by
the standard error scree procedure.
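One way to calculate the n/v ratio at which a fitted curve reaches a target probability is to invert the logistic equation. The sketch below does this with the intercepts and slopes printed in Table 2. The function name and the .80 target are illustrative, and the condition with loadings of .5 and correlated factors is left out because its printed slope repeats the intercept and appears garbled in the source.

```python
import math

# (intercept, b) pairs transcribed from Table 2, keyed by (loading, interfactor correlation).
TABLE_2 = {
    (0.8, 0.0): (-22.32, 3.82),
    (0.8, 0.4): (-20.76, 3.49),
    (0.5, 0.0): (-11.40, 0.92),
}


def required_ratio(intercept, slope, target):
    """n/v value at which the fitted logistic curve reaches the target probability."""
    logit = math.log(target / (1.0 - target))
    return (logit - intercept) / slope


for (loading, corr), (a, b) in TABLE_2.items():
    print(loading, corr, round(required_ratio(a, b, 0.80), 1))
```

For loadings of .5 and uncorrelated factors the result is just under 14 observations per indicator, which agrees with the threshold read from Figure 3.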
Results

Inspection of the hit rates in Table 1 indicated that the sensitivity of the SEscree procedure to
the n/v ratio was conditional on the degree of loadings and, to some extent, on the degree of
correlation among the factors. The hit rates are the actual probabilities of the SEscree procedure
correctly identifying the true number of factors. To facilitate the interpretation of the results in
Table 1, four logistic equations were obtained under the four different combinations of degree of
loadings and interfactor correlation; a summary of the logistic regression estimates is
provided in Table 2. The statistically significant b's, along with the large R²L values, in particular when
the loadings were .8, indicated that the n/v ratio is strongly related to the probability of the
procedure indicating the correct number of factors. The strength of this relationship decreased
when the loadings decreased, and especially so when the factors were correlated.

Insert Table 1 Here

Insert Table 2 Here

The set of predicted probabilities obtained from the regression equation under each of the
four conditions summarized in Table 2 was plotted against the n/v ratio values included in the
study. The actual probabilities of obtaining the correct number of factors by the SEscree (see
Table 1) were plotted on the same axes to demonstrate the degree of match between the
probability predicted by the model and the actual probability obtained by the procedure.
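Written out, the one-predictor logistic model behind these predicted probabilities has the standard form shown below. The symbols a (intercept) and b (slope) follow the column labels of Table 2; using x for the n/v ratio is a notational assumption. The second line is a worked value for loadings of .8 and uncorrelated factors at seven observations per indicator.

```latex
\hat{P}(Y = 1 \mid x) = \frac{1}{1 + e^{-(a + b x)}}, \qquad x = n/v

a + b x = -22.32 + 3.82(7) = 4.42, \qquad
\hat{P} = \frac{1}{1 + e^{-4.42}} \approx .988
```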

Figures 1-4 describe the relationship between the n/v ratio and the predicted and actual
probabilities of determining the correct number of factors by the SEscree procedure. Figures 1 and
2 indicated that when the loadings were .8, whether the factors were correlated or uncorrelated, seven
observations per indicator were needed for the standard error scree procedure to have perfect
probability of indicating the correct number of factors. When the factors were
uncorrelated, the predicted and actual probabilities matched completely, whereas a slight discrepancy
between the actual and predicted probabilities was noticed when the factors were correlated.

Insert Figures 1 and 2 here

As indicated by Figure 3, with factor loadings of .5 and uncorrelated factors, at least 14
observations per indicator were required to ensure that the predicted probability was higher than
.8, and 16 observations per indicator were needed to obtain perfect predicted probability.

Insert Figure 3 here

When the factors were correlated at .4 and loadings were .5 (Figure 4), larger n/v ratios were
required to obtain high predicted probability. At least 18 observations per indicator were needed
to obtain a probability of .8 or higher, and 22 observations per indicator were needed to obtain
perfect predicted probability. In the last two figures, the discrepancy between the actual
and predicted probabilities was apparent, in particular in Figure 4.

Insert Figure 4 here
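The comparison of predicted and actual probabilities can also be tabulated. The sketch below does so for loadings of .5 and uncorrelated factors, using the Table 2 estimates and the Table 1 hit rates. The n/v values are computed as n divided by v, which is an assumption about how the ratios entered the model, and the four-factor cell with n = 500 and v = 48 is omitted because its hit rate is illegible in the source.

```python
import math

INTERCEPT, SLOPE = -11.40, 0.92  # Table 2 estimates for l = .5, r = 0

# (f, n, v, observed hit rate in percent) for l = .5, r = 0, from Table 1.
OBSERVED = [
    (4, 250, 16, 98), (4, 250, 36, 4), (4, 250, 48, 0),
    (4, 500, 16, 100), (4, 500, 36, 83),
    (6, 250, 36, 9), (6, 250, 48, 0),
    (6, 500, 36, 82), (6, 500, 48, 4),
]


def predicted(ratio):
    """Predicted probability of a hit at a given n/v ratio."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * ratio)))


for f, n, v, hit in sorted(OBSERVED, key=lambda cell: cell[1] / cell[2]):
    ratio = n / v
    print(f"f={f}  n/v={ratio:5.1f}  predicted={predicted(ratio):.2f}  actual={hit / 100:.2f}")
```

Listing the cells in order of n/v makes it easy to see where the fitted curve tracks the observed hit rates and where the two part ways.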

Discussion and conclusions 

The results of modeling the number of observations per indicator by logistic regression
indicated that the optimal n/v ratio for determining the number of factors by the standard error
scree procedure depends on the characteristics of the data. A smaller n/v ratio was needed when
the factors were uncorrelated and factor loadings were high. In all conditions, the n/v ratio needed for the
SEscree procedure to correctly identify the true number of factors with high probability exceeded
the minimum of 5:1 stated in some of the related literature (e.g., Cattell, 1952; Cliff, 1987;
Gorsuch, 1983). The results of the current study supported Wolins’ (1982) argument that the
existing rules of thumb concerning the observation-to-indicator ratio are “incorrect” because
they ignore the data characteristics.

The major conclusion from modeling the n/v ratio is that general guidelines are usually
useless and often misleading. Providing a single set of guidelines for all data situations
is perhaps not a wise practice. The results of this study showed that specific characteristics of the
data tended to require more observations per variable than generally recommended. Therefore, guidelines
need to be given in the context of other variables and should be situation specific.

This study provides concrete and useful guidelines regarding the optimal n/v ratio
needed when applying the standard error scree for determining the number of factors to retain.
There is reason to believe that the conclusions from this study may hold true with different data
analysis approaches, such as multiple regression, discriminant analysis, and others. Therefore,
this study should be viewed as a pioneering effort and a stimulus for investigating the issue of n/v more
intensively and with other statistical approaches. Furthermore, the use of logistic regression to
capture information about relationships between the independent variables of interest and whether
the correct number of factors is identified provides a model for analyzing data from complex
simulation studies that makes otherwise complicated relationships easy to communicate.
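Table A-1

Design Variables for the Sample Data

      Orthogonal Factors                  Oblique Factors (r = .4)
  f      n     v     v/f     l          f      n     v     v/f     l
  4    250    16     4:1    .5          4    250    16     4:1    .5
  4    250    16     4:1    .8          4    250    16     4:1    .8
  4    250    36     9:1    .5          4    250    36     9:1    .5
  4    250    36     9:1    .8          4    250    36     9:1    .8
  4    250    48    12:1    .5          4    250    48    12:1    .5
  4    250    48    12:1    .8          4    250    48    12:1    .8
  4    500    16     4:1    .5          4    500    16     4:1    .5
  4    500    16     4:1    .8          4    500    16     4:1    .8
  4    500    36     9:1    .5          4    500    36     9:1    .5
  4    500    36     9:1    .8          4    500    36     9:1    .8
  4    500    48    12:1    .5          4    500    48    12:1    .5
  4    500    48    12:1    .8          4    500    48    12:1    .8
  6    250    36     6:1    .5          6    250    36     6:1    .5
  6    250    36     6:1    .8          6    250    36     6:1    .8
  6    250    48     8:1    .5          6    250    48     8:1    .5
  6    250    48     8:1    .8          6    250    48     8:1    .8
  6    500    36     6:1    .5          6    500    36     6:1    .5
  6    500    36     6:1    .8          6    500    36     6:1    .8
  6    500    48     8:1    .5          6    500    48     8:1    .5
  6    500    48     8:1    .8          6    500    48     8:1    .8

Note. f = number of factors, v = number of indicators, and l = loading size.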
Table 1

Percentage of Times (Hit Rate) the SEscree Procedure Determined the Correct Number
of Factors under Each Condition

                 Variable                            SEscree
  f      n     v     v/f     n/v     l          r=0    r=.4
  4    250    16     4:1    15.6    .5           98      59
  4    250    16     4:1    15.6    .8          100     100
  4    250    36     9:1     6.9    .5            4       1
  4    250    36     9:1     6.9    .8           99      98
  4    250    48    12:1     5.2    .5            0       0
  4    250    48    12:1     5.2    .8            7       4
  4    500    16     4:1    31.3    .5          100     100
  4    500    16     4:1    31.3    .8          100     100
  4    500    36     9:1    13.9    .5           83      77
  4    500    36     9:1    13.9    .8          100     100
  4    500    48    12:1    10.4    .5  [illegible]       4
  4    500    48    12:1    10.4    .8          100      99
  6    250    36     6:1     6.9    .5            9       7
  6    250    36     6:1     6.9    .8           98      99
  6    250    48     8:1     5.2    .5            0       0
  6    250    48     8:1     5.2    .8            9       8
  6    500    36     6:1    13.9    .5           82      79
  6    500    36     6:1    13.9    .8          100     100
  6    500    48     8:1    10.4    .5            4       6
  6    500    48     8:1    10.4    .8          100     100

Note. r = correlation among the factors, f = number of factors, v = number of indicators,
v/f = indicator-to-factor ratio, l = degree of loading, SEscree = standard error scree
procedure.







Table 2

Results of the Logistic Regression with Observation-to-Indicator Ratio as a Predictor Variable for
the Standard Error Scree Procedure

Condition        Intercept        b        p     R²L
l=.8, r=0           -22.32     3.82    0.000     .85
l=.8, r=.4          -20.76     3.49    0.000     .84
l=.5, r=0           -11.40     0.92    0.000     .68
l=.5, r=.4           -5.70    -5.70    0.000     .59

Note. l = degree of factor loadings, r = interfactor correlation.



Figure 1. Probability of determining the correct number of factors by the standard
error scree procedure as a function of the n/v ratio (l=.8, r=0).

Figure 2. Probability of determining the correct number of factors by the standard
error scree procedure as a function of the n/v ratio (l=.8, r=.4).

Figure 3. Probability of determining the correct number of factors by the standard
error scree procedure as a function of the n/v ratio (l=.5, r=0).

Figure 4. Probability of determining the correct number of factors by the standard
error scree procedure as a function of the n/v ratio (l=.5, r=.4).